In the paragraphs that follow, note that the graphical
representations report probabilities of correct repetition of a nonword,
i.e. values on the scale of the predicted variable (the response
scale). The statistical tests of the various effects, however, are
based on the untransformed estimates and confidence intervals,
i.e. values on the scale of the
linear combination of predictors / independent variables (the link scale).
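This distinction can be illustrated in base R: plogis() maps a value from the link (log-odds) scale, on which the tests are performed, to the response (probability) scale, on which the plots are drawn. The value 0.85 below is a hypothetical link-scale estimate, not one taken from the model.

```r
# Tests are carried out on the logit (link) scale; plots report
# probabilities (response scale). plogis() converts between the two.
eta <- 0.85                           # hypothetical estimate on the link scale
p <- plogis(eta)                      # back-transformed probability, about 0.70
stopifnot(all.equal(qlogis(p), eta))  # qlogis() inverts the mapping
p
```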
Investigating the interactions
We can first investigate the higher-level predictors of the model,
that is the different interactions.
- occurrence_l * LoE
- branching_onset * LoE
- rec_voc * LoE
- age * verbal_stm
- age * rec_voc
- verbal_stm * rec_voc
The first two interactions are between one categorical variable and
the continuous variable LoE
(occurrence_l : LoE and
branching_onset : LoE).
We first define a range of variation for the values of LoE
list.LoE <- list(LoE = seq(min(df_reduced$LoE), max(df_reduced$LoE), by = 1))
For occurrence_l : LoE:
plot_model(model_rep, type = "emm", terms = c("LoE [all]", "occurrence_l"))

emtrends(model_rep, pairwise ~ occurrence_l | LoE, var = "LoE", adjust = "mvt", infer = c(TRUE, TRUE))
$emtrends
LoE = 184:
occurrence_l LoE.trend SE df asymp.LCL asymp.UCL z.ratio p.value
coda 0.00119876 0.00343 Inf -0.00553 0.00793 0.349 0.7270
final 0.00352791 0.00335 Inf -0.00303 0.01008 1.055 0.2916
other 0.00000632 0.00145 Inf -0.00284 0.00285 0.004 0.9965
Results are averaged over the levels of: branching_onset, V
Confidence level used: 0.95
$contrasts
LoE = 184:
contrast estimate SE df asymp.LCL asymp.UCL z.ratio p.value
coda - final -0.00233 0.00439 Inf -0.01251 0.00785 -0.531 0.8503
coda - other 0.00119 0.00285 Inf -0.00541 0.00780 0.419 0.9039
final - other 0.00352 0.00348 Inf -0.00456 0.01160 1.011 0.5572
Results are averaged over the levels of: branching_onset, V
Confidence level used: 0.95
Conf-level adjustment: mvt method for 3 estimates
P value adjustment: mvt method for 3 tests
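The "mvt" method noted in the output adjusts the confidence levels and p-values for the three simultaneous comparisons using the joint multivariate-t distribution of the contrasts. Base R offers simpler, more conservative corrections through p.adjust(); the unadjusted p-values below are hypothetical, not taken from the model.

```r
# Hypothetical unadjusted p-values for three pairwise contrasts:
p_raw <- c(0.25, 0.45, 0.60)
p.adjust(p_raw, method = "holm")        # step-down Holm adjustment
p.adjust(p_raw, method = "bonferroni")  # multiply by the number of tests, cap at 1
```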
The figures show that the slopes for the levels
other and coda are quite close, and
visually quite different from the slope for the third level
final. The statistical tests, however, fail to detect a
significant difference. A likely reason is the large
standard errors for coda and final,
which in turn stem from the small number of nonwords associated with
these two levels (4 and 7 respectively, compared to 60 nonwords
without l in coda or final position):
my_table <- df_reduced %>%
dplyr::select(nonword, occurrence_l) %>%
unique() %>%
with(., table(occurrence_l)) %>%
as.data.frame()
colnames(my_table) <- c("Occurrence of l", "# nonwords")
my_table
Occurrence of l # nonwords
1 coda 4
2 final 7
3 other 60
The z ratios are overall small, which might give us some
confidence that there is no false negative here despite the possibly
inflated standard errors due to multicollinearity.
For branching_onset : LoE:
plot_model(model_rep, type = "emm", terms = c("LoE [all]", "branching_onset"))

emtrends(model_rep, pairwise ~ branching_onset | LoE, var = "LoE", adjust = "mvt", infer = c(TRUE, TRUE))
$emtrends
LoE = 184:
branching_onset LoE.trend SE df asymp.LCL asymp.UCL z.ratio p.value
0 0.002248 0.00184 Inf -0.00136 0.00586 1.221 0.2222
1 0.000395 0.00198 Inf -0.00348 0.00427 0.200 0.8418
2 0.002090 0.00312 Inf -0.00403 0.00821 0.669 0.5034
Results are averaged over the levels of: occurrence_l, V
Confidence level used: 0.95
$contrasts
LoE = 184:
contrast estimate SE df asymp.LCL asymp.UCL z.ratio p.value
branching_onset0 - branching_onset1 0.001853 0.00116 Inf -0.000809 0.00452 1.603 0.2303
branching_onset0 - branching_onset2 0.000157 0.00266 Inf -0.005962 0.00628 0.059 0.9980
branching_onset1 - branching_onset2 -0.001696 0.00264 Inf -0.007774 0.00438 -0.642 0.7874
Results are averaged over the levels of: occurrence_l, V
Confidence level used: 0.95
Conf-level adjustment: mvt method for 3 estimates
P value adjustment: mvt method for 3 tests
While the figures show that the slopes for LoE
differ according to the value of branching_onset, the
statistical tests reveal that these slopes are actually not
significantly different from each other.
We then have four interactions involving two continuous variables:
rec_voc : LoE,
age : verbal_stm,
age : rec_voc, and
verbal_stm : rec_voc.
For rec_voc : LoE:
plot_model(model_rep, type = "emm", terms = c("LoE [all]", "rec_voc"))

Knowing that the interaction between rec_voc and
LoE corresponds to the change in the slope of
LoE for every one-unit increase in
rec_voc (or vice versa), we can assess the significance
of this interaction by looking at the contrast / difference between two
slopes of rec_voc separated by a one-unit increase in
LoE (the result is independent of the choice of
the two values separated by one unit).
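This logic can be checked on simulated data: in a model y ~ x1 * x2, the slope of x2 at a given x1 is b_x2 + b_x1:x2 · x1, so two slopes evaluated at x1 values one unit apart differ by exactly the interaction coefficient. The data and coefficients below are purely illustrative, unrelated to the model at hand.

```r
# Simulate data with a known interaction and fit a linear model.
set.seed(1)
d <- data.frame(x1 = runif(200), x2 = runif(200))
d$y <- 1 + 2 * d$x1 - 0.5 * d$x2 + 0.8 * d$x1 * d$x2 + rnorm(200, sd = 0.1)
fit <- lm(y ~ x1 * x2, data = d)
b <- coef(fit)

# Slope of x2 as a function of x1: b["x2"] + b["x1:x2"] * x1
slope_x2 <- function(x1) unname(b["x2"] + b["x1:x2"] * x1)

# Slopes one unit of x1 apart differ by exactly the interaction coefficient
stopifnot(all.equal(slope_x2(1.5) - slope_x2(0.5), unname(b["x1:x2"])))
```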
list.LoE.red <- list(LoE = seq(median(df_reduced$LoE) - 0.5, median(df_reduced$LoE) + 0.5, by = 1))
emtrends(model_rep, pairwise ~ LoE, var = "rec_voc", at = list.LoE.red, adjust = "mvt", infer = c(T, T))
$emtrends
LoE rec_voc.trend SE df asymp.LCL asymp.UCL z.ratio p.value
184 0.0549 0.0206 Inf 0.0146 0.0953 2.669 0.0076
184 0.0552 0.0207 Inf 0.0147 0.0957 2.673 0.0075
Results are averaged over the levels of: occurrence_l, branching_onset, V
Confidence level used: 0.95
$contrasts
contrast estimate SE df asymp.LCL asymp.UCL z.ratio p.value
LoE183.5 - LoE184.5 -0.00029 0.000211 Inf -0.000704 0.000123 -1.377 0.1684
Results are averaged over the levels of: occurrence_l, branching_onset, V
Confidence level used: 0.95
We find that the estimate for the interaction is not significantly
different from 0 (the p-value is larger than 0.05).
For age : verbal_stm:
plot_model(model_rep, type = "emm", terms = c("age [all]", "verbal_stm"))

list.verbal_stm.red <- list(verbal_stm = seq(median(df_reduced$verbal_stm) - 0.5, median(df_reduced$verbal_stm) + 0.5, by = 1))
emtrends(model_rep, pairwise ~ verbal_stm, var = "age", at = list.verbal_stm.red, adjust = "mvt", infer = c(T, T))
$emtrends
verbal_stm age.trend SE df asymp.LCL asymp.UCL z.ratio p.value
3.5 0.00595 0.0113 Inf -0.0161 0.0280 0.528 0.5976
4.5 0.00917 0.0154 Inf -0.0210 0.0393 0.596 0.5511
Results are averaged over the levels of: occurrence_l, branching_onset, V
Confidence level used: 0.95
$contrasts
contrast estimate SE df asymp.LCL asymp.UCL z.ratio p.value
verbal_stm3.5 - verbal_stm4.5 -0.00322 0.0121 Inf -0.0269 0.0205 -0.266 0.7901
Results are averaged over the levels of: occurrence_l, branching_onset, V
Confidence level used: 0.95
The p-value is much larger than 0.05.
For age : rec_voc:
plot_model(model_rep, type = "emm", terms = c("age [all]", "rec_voc"))

list.rec_voc.red <- list(rec_voc = seq(median(df_reduced$rec_voc) - 0.5, median(df_reduced$rec_voc) + 0.5, by = 1))
emtrends(model_rep, pairwise ~ rec_voc, var = "age", at = list.rec_voc.red, adjust = "mvt", infer = c(T, T))
$emtrends
rec_voc age.trend SE df asymp.LCL asymp.UCL z.ratio p.value
20 0.00751 0.0116 Inf -0.0152 0.0302 0.648 0.5172
21 0.00666 0.0116 Inf -0.0161 0.0294 0.574 0.5662
Results are averaged over the levels of: occurrence_l, branching_onset, V
Confidence level used: 0.95
$contrasts
contrast estimate SE df asymp.LCL asymp.UCL z.ratio p.value
rec_voc20 - rec_voc21 0.00085 0.00156 Inf -0.00221 0.00391 0.544 0.5861
Results are averaged over the levels of: occurrence_l, branching_onset, V
Confidence level used: 0.95
The p-value for the interaction is higher than 0.05.
For verbal_stm : rec_voc:
plot_model(model_rep, type = "emm", terms = c("verbal_stm", "rec_voc"))

list.verbal_stm.red <- list(verbal_stm = seq(median(df_reduced$verbal_stm) - 0.5, median(df_reduced$verbal_stm) + 0.5, by = 1))
emtrends(model_rep, pairwise ~ verbal_stm, var = "rec_voc", at = list.verbal_stm.red, adjust = "mvt", infer = c(T, T))
$emtrends
verbal_stm rec_voc.trend SE df asymp.LCL asymp.UCL z.ratio p.value
3.5 0.0581 0.0281 Inf 0.00301 0.1133 2.067 0.0387
4.5 0.0503 0.0222 Inf 0.00673 0.0938 2.263 0.0236
Results are averaged over the levels of: occurrence_l, branching_onset, V
Confidence level used: 0.95
$contrasts
contrast estimate SE df asymp.LCL asymp.UCL z.ratio p.value
verbal_stm3.5 - verbal_stm4.5 0.00787 0.0329 Inf -0.0565 0.0723 0.240 0.8106
Results are averaged over the levels of: occurrence_l, branching_onset, V
Confidence level used: 0.95
The p-value is once again much larger than 0.05.
Our investigation of the interactions shows that none of them is
statistically significant. One option would then be to simplify
the model by dropping these interactions. However, this amounts to model
selection on the fixed effects, which a number of prominent
statisticians warn against. In what follows, we therefore
assess the main effects despite the presence of interactions in the
model.
Investigating the main effects
Given that none of the interactions we expected to matter
turns out to be significant, we can focus on the main effects in our model,
i.e. the effects of the item-related categorical variables
occurrence_l, branching_onset, and
V, and the subject-related continuous variables
rec_voc, verbal_stm,
LoE, age,
L1_syll_complexity and
phono_awareness.
For occurrence_l:
plot_model(model_rep, type = "emm", terms = "occurrence_l")

summary(emmeans(model_rep, pairwise ~ occurrence_l, adjust = "mvt", side = "<"), infer = c(TRUE, TRUE), null = 0)$contrasts
contrast estimate SE df asymp.LCL asymp.UCL z.ratio p.value
coda - final -0.489 0.526 Inf -Inf 0.6044 -0.930 0.3960
coda - other -1.244 0.398 Inf -Inf -0.4175 -3.129 0.0024
final - other -0.756 0.373 Inf -Inf 0.0199 -2.026 0.0562
Results are averaged over the levels of: branching_onset, V
Results are given on the log odds ratio (not the response) scale.
Confidence level used: 0.95
Conf-level adjustment: mvt method for 3 estimates
P value adjustment: mvt method for 3 tests
P values are left-tailed
We set the parameter side to reflect our hypothesis
that a nonword gets easier to repeat when moving from l in coda
position, to l in final position, to a structure without l.
We find that:
- nonwords are significantly more difficult to repeat when l
appears in internal coda position than when there is no l (coda
- other, p = 0.002)
- we have no evidence that nonwords are more
difficult to repeat when l appears in internal coda position than when l
appears in final position (coda - final, p =
0.396)
- there is a strong tendency for nonwords where l appears in final
position to be more difficult to repeat than nonwords where there is no
l (final - other, p = 0.056)
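Setting aside the mvt multiplicity adjustment, a left-tailed p-value of this kind is simply half the two-tailed one when the z statistic points in the hypothesized direction. The z value below is illustrative, not taken from the model.

```r
# One-tailed vs two-tailed p-values for a z statistic:
z <- -2.0
p_two  <- 2 * pnorm(-abs(z))  # two-tailed p-value
p_left <- pnorm(z)            # left-tailed p-value (side = "<")
stopifnot(all.equal(p_left, p_two / 2))
round(c(two_sided = p_two, left_tailed = p_left), 4)
```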
For branching_onset:
plot_model(model_rep, type = "emm", terms = "branching_onset")

summary(emmeans(model_rep, pairwise ~ branching_onset, adjust = "mvt", side = ">"), infer = c(TRUE, TRUE), null = 0)$contrasts
contrast estimate SE df asymp.LCL asymp.UCL z.ratio p.value
branching_onset0 - branching_onset1 0.847 0.184 Inf 0.47667 Inf 4.614 <.0001
branching_onset0 - branching_onset2 1.875 0.499 Inf 0.86886 Inf 3.760 0.0002
branching_onset1 - branching_onset2 1.028 0.506 Inf 0.00778 Inf 2.033 0.0485
Results are averaged over the levels of: occurrence_l, V
Results are given on the log odds ratio (not the response) scale.
Confidence level used: 0.95
Conf-level adjustment: mvt method for 3 estimates
P value adjustment: mvt method for 3 tests
P values are right-tailed
We set the parameter side to reflect our hypothesis
that the more branching onsets in a nonword, the more difficult it is to
repeat.
We observe that all the contrasts are significant, and that
therefore:
- A nonword is more difficult to repeat when it has 1 branching onset
than when it has none (p < 0.001)
- A nonword is more difficult to repeat when it has 2 branching onsets
than when it has none (p < 0.001)
- A nonword is more difficult to repeat when it has 2 branching onsets
than when it has 1 (p = 0.048)
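Since the contrasts above are reported on the log odds ratio scale, exponentiating an estimate yields the corresponding odds ratio; for instance, for the branching_onset0 - branching_onset1 contrast (estimate 0.847):

```r
# The 0 vs 1 branching-onset contrast is 0.847 on the log odds ratio
# scale; exponentiating gives the odds ratio (about 2.33): the odds of
# a correct repetition are roughly 2.3 times higher with no branching
# onset than with one.
exp(0.847)
```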
For V:
plot_model(model_rep, type = "emm", terms = "V")

summary(emmeans(model_rep, pairwise ~ V, adjust = "mvt", side = ">"), infer = c(TRUE, TRUE), null = 0)$contrasts
contrast estimate SE df asymp.LCL asymp.UCL z.ratio p.value
V1 - V2 0.0767 0.212 Inf -0.3648 Inf 0.361 0.6670
V1 - V3 0.5764 0.217 Inf 0.1261 Inf 2.660 0.0109
V2 - V3 0.4997 0.215 Inf 0.0528 Inf 2.324 0.0273
Results are averaged over the levels of: occurrence_l, branching_onset
Results are given on the log odds ratio (not the response) scale.
Confidence level used: 0.95
Conf-level adjustment: mvt method for 3 estimates
P value adjustment: mvt method for 3 tests
P values are right-tailed
The parameter side corresponds to the hypothesis
that the fewer vowels in a nonword, the easier it is to
repeat.
We find two significant differences: a nonword with a single vowel is
easier to repeat than a nonword with 3 vowels (p = 0.011), and
a nonword with 2 vowels is easier to repeat than a nonword with 3 vowels
(p = 0.027). We cannot conclude that there is any difference
between 1 and 2 vowels (p = 0.667).
For rec_voc:
plot_model(model_rep, type = "emm", terms = "rec_voc [all]")

summary(emtrends(model_rep, ~ rec_voc, var = "rec_voc", adjust = "mvt", side = ">"), infer = c(TRUE, TRUE), null = 0)
rec_voc rec_voc.trend SE df asymp.LCL asymp.UCL z.ratio p.value
20.4 0.0552 0.0207 Inf 0.0212 Inf 2.672 0.0038
Results are averaged over the levels of: occurrence_l, branching_onset, V
Confidence level used: 0.95
P values are right-tailed
We set the parameter side to reflect our hypothesis
that the larger a child’s French receptive vocabulary, the higher the
probability of correct repetition.
We observe a significant effect for rec_voc
(p = 0.004): the size of a child’s receptive vocabulary
positively impacts their ability to correctly repeat nonwords.
For verbal_stm:
plot_model(model_rep, type = "emm", terms = "verbal_stm")

summary(emtrends(model_rep, ~ verbal_stm, var = "verbal_stm", adjust = "mvt", side = ">"), infer = c(TRUE, TRUE), null = 0)
verbal_stm verbal_stm.trend SE df asymp.LCL asymp.UCL z.ratio p.value
3.88 -0.00826 0.123 Inf -0.211 Inf -0.067 0.5267
Results are averaged over the levels of: occurrence_l, branching_onset, V
Confidence level used: 0.95
P values are right-tailed
We set the parameter side to reflect our hypothesis
that the larger a child’s verbal short-term memory, the higher the
probability of correct repetition.
We do not observe a significant effect of
verbal_stm.
For LoE:
plot_model(model_rep, type = "emm", terms = "LoE [all]")

summary(emtrends(model_rep, ~ LoE, var = "LoE", adjust = "mvt", side = ">"), infer = c(TRUE, TRUE), null = 0)
LoE LoE.trend SE df asymp.LCL asymp.UCL z.ratio p.value
184 0.00158 0.00199 Inf -0.0017 Inf 0.791 0.2145
Results are averaged over the levels of: occurrence_l, branching_onset, V
Confidence level used: 0.95
P values are right-tailed
We set the parameter side to reflect our hypothesis
that the longer the exposure to French, the higher the probability of
correct repetition.
We do not observe a significant effect of LoE on the
probability of correct repetition. This may be due, however, to an
inflation of the standard error resulting from multicollinearity.
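The SE-inflation mechanism can be illustrated on simulated data (hypothetical, unrelated to the model): when a predictor is nearly collinear with another, the standard error of its coefficient grows sharply.

```r
set.seed(42)
x1 <- rnorm(200)
x2_indep <- rnorm(200)                  # predictor uncorrelated with x1
x2_coll  <- x1 + rnorm(200, sd = 0.1)   # predictor nearly collinear with x1
y1 <- x1 + x2_indep + rnorm(200)
y2 <- x1 + x2_coll  + rnorm(200)

# Standard error of x1's coefficient in each fit:
se1 <- summary(lm(y1 ~ x1 + x2_indep))$coefficients["x1", "Std. Error"]
se2 <- summary(lm(y2 ~ x1 + x2_coll))$coefficients["x1", "Std. Error"]
stopifnot(se2 > se1)  # collinearity inflates the SE of x1's coefficient
c(se_independent = se1, se_collinear = se2)
```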
For age:
plot_model(model_rep, type = "emm", terms = "age [all]")

summary(emtrends(model_rep, ~ age, var = "age", adjust = "mvt", side = ">"), infer = c(TRUE, TRUE), null = 0)
age age.trend SE df asymp.LCL asymp.UCL z.ratio p.value
90.5 0.00716 0.0116 Inf -0.0119 Inf 0.619 0.2681
Results are averaged over the levels of: occurrence_l, branching_onset, V
Confidence level used: 0.95
P values are right-tailed
We set the parameter side to reflect our hypothesis
that the older a child is, the higher the probability of correct
repetition.
We do not observe a significant effect of age
(p = 0.268).
For L1_syll_complexity:
plot_model(model_rep, type = "emm", terms = "L1_syll_complexity [all]")

summary(emtrends(model_rep, ~ age, var = "L1_syll_complexity", adjust = "mvt", side = ">"), infer = c(TRUE, TRUE), null = 0)
age L1_syll_complexity.trend SE df asymp.LCL asymp.UCL z.ratio p.value
90.5 0.159 0.134 Inf -0.062 Inf 1.184 0.1183
Results are averaged over the levels of: occurrence_l, branching_onset, V
Confidence level used: 0.95
P values are right-tailed
We set the parameter side to reflect our hypothesis
that the higher the complexity of the syllables of the L1, the higher
the probability of correct repetition.
We do not observe a significant effect of
L1_syll_complexity (p = 0.118).
For phono_awareness:
plot_model(model_rep, type = "emm", terms = "phono_awareness [all]")

summary(emtrends(model_rep, ~ phono_awareness, var = "phono_awareness", adjust = "mvt", side = ">"), infer = c(TRUE, TRUE), null = 0)
phono_awareness phono_awareness.trend SE df asymp.LCL asymp.UCL z.ratio p.value
-2.82 0.125 0.0683 Inf 0.0131 Inf 1.837 0.0331
Results are averaged over the levels of: occurrence_l, branching_onset, V
Confidence level used: 0.95
P values are right-tailed
We set the parameter side to reflect our hypothesis
that the more developed a child’s phonological awareness, the higher the
probability of correct repetition.
We observe a significant effect of phono_awareness
(p = 0.033): the more developed the child’s phonological
awareness, the better they are at correctly repeating the nonwords.
Summary for the main effects
Not all our hypotheses were confirmed. We found the following
results:
- nonwords are significantly more difficult to repeat when l appears
in internal coda position than when there is no l (p =
0.002).
- the more branching onsets nonwords have, the more difficult they are
to repeat (1 versus 0 BO: p < 0.001; 2 versus 0 BO:
p < 0.001; 2 versus 1 BO: p = 0.048)
- nonwords with 3 vowels are more difficult to repeat than nonwords
with 1 or 2 vowels (p = 0.011 and p = 0.027,
respectively)
- the larger a child’s receptive vocabulary, the easier for them to
repeat the nonwords (p = 0.004)
- the more developed a child’s phonological awareness, the easier for
them to repeat the nonwords (p = 0.033)
Additionally, we observed a statistical tendency for nonwords to be
more difficult to repeat when l appears in final position
than when there is no l (p = 0.056).
We did not observe effects of verbal_stm,
age, or LoE. We also did not find a
significant difference between l appearing in internal coda position in
nonwords and l appearing in final position. We finally did not find an
effect of the complexity of syllables in L1.
Our analysis overall proves to be quite simple, in the sense that we
did not observe any significant interaction.
p1 <- plot_model(model_rep, type = "emm", terms = "occurrence_l", show.values = TRUE, value.offset = .3, colors = "gs") +
labs(title = "Effect of occurrence_l", x = "Location of /l/", y = "Predicted probability of correct repetition")
p2 <- plot_model(model_rep, type = "emm", terms = "branching_onset", show.values = TRUE, value.offset = .3, colors = "gs") +
labs(title = "Effect of branching_onset", x = "Number of branching onsets", y = "Predicted probability of correct repetition")
p3 <- plot_model(model_rep, type = "emm", terms = "V", show.values = TRUE, value.offset = .3, colors = "gs") +
labs(title = "Effect of V", x = "Number of vowels", y = "Predicted probability of correct repetition")
p4 <- plot_model(model_rep, type = "emm", terms = "rec_voc [all]", show.values = TRUE, value.offset = .3, colors = "gs") +
labs(title = "Effect of rec_voc", x = "Size of the receptive vocabulary", y = "Predicted probability of correct repetition")
p5 <- plot_model(model_rep, type = "emm", terms = "phono_awareness [all]", show.values = TRUE, value.offset = .3, colors = "gs") +
labs(title = "Effect of phonological awareness", x = "Phonological awareness", y = "Predicted probability of correct repetition")
grid.arrange(p1, p2, p3, p4, p5, ncol = 3)
